The problem of percentile rank scores used with small reference sets

Author

  • Lutz Bornmann
Abstract

Dear Sir, Instead of a relative mean citation rate, a percentile rank score (PRS) can be used in bibliometrics to generate a normalized citation impact for a paper. The use of a PRS is advantageous because no assumptions have to be made about the distribution of citations in the reference set; that is, the scores are also applicable to the (usually) right-skewed distributions. To calculate the PRS for a single paper, a publication set (the reference set) must be compiled from a database (e.g., Thomson Reuters’ Web of Science), consisting of publications of the same document type, in the same subject category (or journal), and from the same publication year. Calculating the PRS takes two steps. First, all publications in the set are ranked in increasing order, $X_1 \le X_2 \le \dots \le X_n$, where $X_1$ ($X_n$) denotes the number of citations received by the least (most) cited publication. Second, each publication is assigned a PRS based on this distribution. If, for example, a single publication had 50 citations, and this citation count was equal to or greater than the citation counts of 90% of all publications, then the PRS of this publication would be at least 90. The publication would be in the 90th percentile (or higher). Because every publication in the reference set is assigned a PRS, the publication in question receives one as well. This PRS shows the publication’s normalized citation impact relative to the papers of the same document type published in the same subject category (or journal) and year. In response to the paper of Leydesdorff, Bornmann, Mutz, and Opthof (2011), several methods have been proposed to calculate the PRS, with the intention of solving the problem of small publication sets (Rousseau, 2012; Schreiber, 2012). For example, a reference set may contain only a small number of papers, say ten. In that case, the highest possible PRS would be based on (9/10) or 90%.
Rousseau (2012) proposes to define this highest possible rank as 100% by including the ranked paper in the ranking, and thus to consider (10/10) as the highest possible rank (see Leydesdorff, in press). The calculation of PRS for small data sets is a well-known problem in statistics, and several solutions have been proposed (Hyndman & Fan, 1996; Sheskin, 2007). The calculation of PRS using linear interpolation is integrated into statistical software packages (e.g., it is part of the EXAMINE procedure of SPSS, of the R-function QUANTILE, and of the CENTILE command of Stata). However, I would like to point out that reliable results in the calculation of PRS can only be reached if the reference sets are based on a larger publication set (preferably journal sets or larger subject categories rather than single journals). Moreover, with large sets the differences between the various proposed methods of PRS calculation might be of little or no practical consequence. Small reference sets lead to unreliable performance estimations for single publications. Although the different proposals for dealing with small reference sets appear to be interesting solutions to this problem, the lack of reliability caused by the small number of publications remains in each case. Thus, I recommend using PRS with reference sets that contain a sufficient number of publications, rather than relying on mathematical solutions for dealing with small numbers of papers.
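The difference between the two counting conventions mentioned above can be illustrated with a minimal Python sketch. It is only a simplified reading of the abstract, not the exact formulas of Rousseau (2012), Schreiber (2012), or any statistical package: the function name `percentile_rank_scores`, the `include_self` flag, and the citation counts are hypothetical and chosen for illustration. The first convention counts the papers with strictly fewer citations (so the top paper of ten reaches 9/10), the second also counts the ranked paper itself (10/10).

```python
from bisect import bisect_left, bisect_right


def percentile_rank_scores(reference_counts, include_self=False):
    """Return one score (0-100) per paper in the reference set.

    include_self=False: share of papers with strictly fewer citations.
    include_self=True:  share of papers with fewer or equal citations,
                        i.e. the ranked paper is counted as well.
    Both conventions are illustrative assumptions, not a published formula.
    """
    ranked = sorted(reference_counts)  # step 1: rank in increasing order
    n = len(ranked)
    locate = bisect_right if include_self else bisect_left
    # step 2: assign each paper a score based on its position in the ranking
    return [100.0 * locate(ranked, c) / n for c in reference_counts]


# Hypothetical reference set of ten papers (citation counts are made up).
small_set = [0, 1, 2, 3, 5, 8, 13, 21, 34, 50]
top_excl = percentile_rank_scores(small_set)[-1]
top_incl = percentile_rank_scores(small_set, include_self=True)[-1]
print(top_excl, top_incl)  # top paper: 9/10 versus 10/10 of the set

# With 1,000 distinct citation counts the two conventions nearly coincide.
large_set = list(range(1000))
gap = (percentile_rank_scores(large_set, include_self=True)[-1]
       - percentile_rank_scores(large_set)[-1])
print(round(gap, 3))
```

For a paper whose citation count is unique in the set, the two conventions differ by 100/n points, which is why the choice matters for ten papers and much less for a thousand. This only concerns the gap between the conventions; it does not address the reliability problem of small sets that the letter raises.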

Journal title:
  • JASIST

Volume 64  Issue 

Pages  -

Publication date 2013